ABSTRACT


There is a critical need to develop new educational technol- 
ogy applications that analyze the data collected by univer- 
sities to ensure that students graduate in a timely fashion 
(4 to 6 years); and they are well prepared for jobs in their 
respective fields of study. In this paper, we present a novel 
approach for analyzing historical educational records from 
a large, public university to perform next-term grade pre- 
diction; i.e., to estimate the grades that a student will get 
in a course that he/she will enroll in the next term. Accu- 
rate next-term grade prediction holds the promise for bet- 
ter student degree planning, personalized advising and au- 
tomated interventions to ensure that students stay on track 
in their chosen degree program and graduate on time. We 
present a factorization-based approach called Matrix Factor- 
ization with Temporal Course-wise Influence that incorpo- 
rates course-wise influence effects and temporal effects for 
grade prediction. In this model, students and courses are 
represented in a latent “knowledge” space. The grade of a 
student on a course is modeled as the similarity of their la- 
tent representation in the “knowledge” space. Course-wise 
influence is considered as an additional factor in the grade 
prediction. Our experimental results show that the proposed
method outperforms several baseline approaches and infers
meaningful patterns between pairs of courses within academic
programs.
next-term grade prediction, course-wise influence, temporal
1. INTRODUCTION


Data analytics is at the forefront of innovation in several 
of today’s popular Educational Technologies (EdTech) [17]. 
Currently, one of the grand challenges facing higher educa- 
tion is the problem of student retention and graduation [19]. 
There is a critical need to develop new EdTech applications
that analyze the data collected by universities to ensure that
students graduate in a timely fashion (4 to 6 years) and are
well prepared for jobs in their respective fields of study.
To this end, several universities deploy a suite of software
and tools. For example, degree planners¹ assist students
in deciding their majors or fields of study, choosing the
sequence of courses within their chosen major and providing
advice for achieving career and learning objectives. Early
warning systems [27] inform advisors/students of progress,
and additionally provide cues for intervention when students 
are at the risk of failing one or more courses and dropping 
out of their program of study. In this work, we focus on the 
problem of next-term grade prediction where the goal is to 
predict the grade that a student is expected to obtain in a 
course that he/she may enroll in the next term (future). 


In the past few years, several algorithms have been devel- 
oped to analyze educational data, including Matrix Factor- 
ization (MF) algorithms inspired from recommender system 
research. MF methods decompose the student-course (or 
student-task) grade matrix into two low-rank matrices, and 
then the prediction of the grade for a student on an untaken 
course is calculated as the product of the corresponding vec- 
tors in the two decomposed matrices [22, 11]. Traditional 
MF algorithms have shown a strong ability to deal with 
sparse datasets [14] and their extensions have incorporated 
temporal and dynamic information [12]. In our setting, we 
consider that a student’s knowledge is continuously being 
enriched while taking a sequence of courses; and it is im- 
portant to incorporate this dynamic influence of sequential 
courses within our models. Therefore, we present a novel
approach, referred to as the Matrix Factorization with Temporal
Course-wise Influence (MFTCI) model, to predict next-term
student grades. MFTCI considers that a student's grade on
a certain course is determined by two components: (i) the
student's competence with respect to each course's topics,
content and requirements, and (ii) the student's previous
performance in other courses. We performed a comprehensive
set of experiments on various datasets. The experimental
results show that the proposed method outperforms
several state-of-the-art methods. The main contributions of
our work in this paper are as follows:


1. We model and incorporate temporal course-wise influence
   in addition to matrix factorization for grade prediction.
   Our experimental results demonstrate significant improvement
   from course-wise influence.

2. Our model successfully captures meaningful course-wise
   influences which correlate to the course content.

3. The learned influences between pairs of courses help
   in understanding pre-requisite structures within programs
   and tuning academic program chains.

¹ http://www.blackboard.com/mobile-learning/planner.aspx


2. RELATED WORK 


Over the past few years, several methods have been developed
to model student behavior and academic performance [2, 9],
with the aim of improving learning outcomes [21]. Methods
influenced by Recommender System (RS) research [1], including
Collaborative Filtering (CF) [18] and Matrix Factorization [13],
have attracted increasing attention in educational mining
applications related to student grade prediction [32] and
in-class assessment prediction [8]. Sweeney et al. [31, 30]
performed an extensive study of several recommender system
approaches, including SVD, SVD-kNN and Factorization Machine
(FM), to predict next-term grade performance. Inspired by
content-based recommendation [20] approaches, Polyzou et al. [23]
addressed the future course grade prediction problem with
three approaches: course-specific regression, student-specific
regression and course-specific matrix factorization. Moreover,
neighborhood-based CF approaches [25, 4, 6] predict grades
based on student similarities, i.e., they first identify
similar students and then use their grades to estimate the
grades of the students with similar profiles.


To capture changes in user dynamics over time in RS, various
dynamic models have been developed. Many of these models are
based on Matrix Factorization and state space models. Sun
et al. [28, 29] model user preference change using a state
space model on latent user factors, and estimate user factors
over time using noncausal Kalman filters. Similarly, Chua
et al. [5] apply Linear Dynamical Systems (LDS) on Non-negative
Matrix Factorization (NMF) to model user dynamics. Ju et al. [12]
encapsulate the temporal relationships within a non-negative
matrix formulation. Zhang et al. [34] learn an explicit
transition matrix over the latent factors for each user, and
estimate the user and item latent factors and the transition
matrices within a Bayesian framework. Other popular methods
for dynamic modeling include time-weighting similarity
decaying [7], tensor factorization [33] and point processes [16].
The method proposed in this paper tackles the challenges of
next-term grade prediction, which relate to the evolution of
student knowledge over a sequence of courses. Our key
contribution is how we incorporate the temporal course-wise
relationships within an MF approach. Additionally, the proposed
approach learns pairwise relationships between courses that
can help in understanding pre-requisite structures within
programs and tuning academic program chains.


3. PRELIMINARIES 


3.1 Problem Statement and Notations 

Formally, student-course grades will be represented by a series
of matrices $\{G_1, G_2, \ldots, G_T\}$ for $T$ terms. Each row
of $G_t$ represents a student, each column of $G_t$ represents a
course, and each value in $G_t$, denoted as $g^t_{s,c}$, represents
the grade that student $s$ got on course $c$ in term $t$
($g^t_{s,c} \in (0, 4]$; $g^t_{s,c} = 0$ indicates that student $s$
did not take course $c$ in term $t$. We add a small value to failing
grades to distinguish a 0 score from such a situation). Student-course
grades up to the $t$-th term will be represented by
$G^t = \sum_{i=1}^{t} G_i$ with size $n \times m$, where $n$ is the
number of students and $m$ is the number of courses. Given the
database of (student, course, grade) triples up to term $(T - 1)$
(i.e., $G^{T-1}$), the next-term grade prediction problem is to
predict grades for each student on courses they might enroll in
the next term $T$. To simplify the notation, if not specifically
stated in this paper, we will use $g_{s,c}$ to denote $g^t_{s,c}$.
Our testing set is then the (student, course, grade) triples in the
$T$-th term, represented by matrix $G_T$. Rows from the grade
matrices representing a student $s$ will simply be represented as
$G(s,:)$, and the specific courses that the student has a grade for
in this row can be given by $c \in G(s,:)$.


In this paper, all vectors (e.g., $\mathbf{u}_s^T$ and $\mathbf{v}_c$)
are represented by bold lower-case letters and all matrices
(e.g., $A$) are represented by upper-case letters. Column vectors
are represented by having the transpose superscript $^T$, otherwise
by default they are row vectors. A predicted/approximated value
is denoted by having a tilde ($\tilde{\ }$) head.


4. METHODS 


4.1. MF with Temporal Course-wise Influence 
We consider student $s$'s grade on a certain course $c$, denoted
as $g_{s,c}$, as determined by two factors. The first factor
is student $s$'s competence with respect to course $c$'s
topics, content and requirements. This is modeled through
a latent factor model, in which $s$'s competence is captured
using a size-$k$ latent factor $\mathbf{u}_s$, and $c$'s topics and
contents are captured using a size-$k$ latent factor $\mathbf{v}_c$
in the same latent space as $\mathbf{u}_s$. Then the competence of
$s$ over $c$ is modeled by the "similarity" between $\mathbf{u}_s$
and $\mathbf{v}_c$ via their dot product (i.e., $\mathbf{u}_s^T \mathbf{v}_c$).


The second factor is the previous performance of student s 
over other courses. We hypothesize that if course c’ has a 
positive influence on course c, and student s achieved a high 
grade on c’, then s tends to have a high grade on c. Under 
this hypothesis, we model this second factor as the product of
the student's grade on a previous "related" course and the
influence of that course on the target course, where the pairwise
course influences are learned in our formulation. Note that we
consider this pairwise
course influence as time independent, i.e., the influence of 
one course over another does not change over time. How- 
ever, the impact from previous performance/grades can be 
modeled using a decay function over time. Taking these two 
factors, the estimated grade is given as follows: 


$$
\tilde{g}_{s,c} = \mathbf{u}_s^T \mathbf{v}_c
+ \underbrace{e^{-\alpha} \frac{\sum_{c' \in G_{T-1}(s,:)} A(c', c)\, g_{s,c'}}{|G_{T-1}(s,:)|}}_{\Delta(T-1)}
+ \underbrace{e^{-2\alpha} \frac{\sum_{c' \in G_{T-2}(s,:)} A(c', c)\, g_{s,c'}}{|G_{T-2}(s,:)|}}_{\Delta(T-2)}
\tag{1}
$$

in which $A(c', c)$ is the influence of $c'$ on $c$,
$G_{T-1}(s,:)$/$G_{T-2}(s,:)$ is the subset of courses out of all
courses that $s$ has taken in the first/second previous terms,
$|G_{T-1}(s,:)|$/$|G_{T-2}(s,:)|$ is the number of such taken
courses, and $e^{-\alpha}$/$e^{-2\alpha}$ denote the time-decay
factors. In Equation 1, we consider the previous two terms. More
previous terms can be included with even stronger time-decay
factors. Given the grade estimation as in Equation 1, we formulate
the grade prediction problem for term $T$ as the following
optimization problem,


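$$
\min_{U, V, A} \; \frac{1}{2} \sum_{s,c} (g_{s,c} - \tilde{g}_{s,c})^2
+ \frac{\gamma}{2} \left( \|U\|_F^2 + \|V\|_F^2 \right)
+ \tau \|A\|_* + \lambda \|A\|_{\ell_1}
\quad \text{s.t. } A \ge 0
$$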


where $U$ and $V$ are the latent non-negative student factors
and course factors, respectively; $\|A\|_*$ is the nuclear norm
of $A$, which will induce an $A$ of low rank; and $\|A\|_{\ell_1}$ is
the $\ell_1$ norm of $A$, which will introduce sparsity in $A$. In
addition, the non-negativity constraint on $A$ is to enforce only
positive influence across courses.
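As an illustration of the grade estimate in Equation 1, the following minimal Python sketch computes the competence term plus the two time-decayed influence terms. It is not the authors' implementation: the function name `predict_grade`, the dictionary-based data layout, the course identifiers, and the reading of the decay factors as exp(-alpha) and exp(-2*alpha) are all assumptions made for this example.

```python
import math

def predict_grade(u_s, v_c, A, prev1, prev2, c, alpha=1.0):
    """Estimate a grade following the structure of Equation 1.

    u_s, v_c : latent factors (lists of floats) of the student and the course.
    A        : dict mapping (c_prev, c) -> non-negative influence of c_prev on c.
    prev1    : dict course -> grade for courses the student took in term T-1.
    prev2    : dict course -> grade for courses the student took in term T-2.
    alpha    : decay rate (hypothetical default, assumed for this sketch).
    """
    # Competence term: dot product of the student and course factors.
    estimate = sum(u * v for u, v in zip(u_s, v_c))
    # Influence terms: average influence-weighted grade per previous term,
    # discounted more strongly for the older term.
    for lag, grades in ((1, prev1), (2, prev2)):
        if grades:  # skip a term in which the student took no courses
            weighted = sum(A.get((cp, c), 0.0) * g for cp, g in grades.items())
            estimate += math.exp(-lag * alpha) * weighted / len(grades)
    return estimate

# Hypothetical example: course "c1" influences "c3" with weight 0.4.
A = {("c1", "c3"): 0.4}
print(predict_grade([1.0, 0.5], [2.0, 1.0], A, {"c1": 4.0, "c2": 3.0}, {}, "c3"))
```

Missing entries of `A` are treated as zero influence, which mirrors the sparsity that the $\ell_1$ penalty is meant to encourage.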


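4.1.1 Optimization Algorithm of MFTCI
We apply the ADMM [3] technique to the optimization problem
above by reformulating it as follows,

$$
\begin{aligned}
\min_{U, V, A, Z_1, Z_2} \;
& \frac{1}{2} \sum_{s,c} (g_{s,c} - \tilde{g}_{s,c})^2
+ \frac{\gamma}{2} \left( \|U\|_F^2 + \|V\|_F^2 \right) \\
& + \tau \|Z_1\|_* + \lambda \|Z_2\|_{\ell_1} \\
& + \frac{\rho}{2} \left( \|A - Z_1\|_F^2 + \|A - Z_2\|_F^2 \right) \\
& + \rho \, \mathrm{tr}\!\left( U_1^T (A - Z_1) \right)
+ \rho \, \mathrm{tr}\!\left( U_2^T (A - Z_2) \right) \\
\text{s.t. } & A \ge 0
\end{aligned}
$$

where $Z_1$ and $Z_2$ are two auxiliary variables, and $U_1$ and $U_2$
are two dual variables. All the variables are solved via an
alternating approach as follows.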


Step 1: Update $U$ and $V$. Fixing all the other variables and
solving for $U$ and $V$, the problem becomes a classical matrix
factorization problem:

$$
\min_{U, V} \; \frac{1}{2} \sum_{s,c} (f_{s,c} - \mathbf{u}_s^T \mathbf{v}_c)^2
+ \frac{\gamma}{2} \Big( \sum_{s} \|\mathbf{u}_s\|_2^2 + \sum_{c} \|\mathbf{v}_c\|_2^2 \Big)
\tag{2}
$$

where $f_{s,c} = g_{s,c} - \Delta(T-1) - \Delta(T-2)$ (see Eq. 1). The
matrix factorization problem can be solved using alternating
minimization.
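The alternating scheme of Step 1 can be sketched as follows: gradient steps on the student factors with the course factors fixed, then the reverse. This is only a minimal illustration under stated assumptions; the function name `mf_alternating_gd`, the regularization weight `gamma`, the learning rate, the iteration count and the random initialization are hypothetical choices, not values taken from the paper.

```python
import random

def mf_alternating_gd(residuals, n_students, n_courses, k=2, gamma=0.1,
                      lr=0.02, n_iter=500, seed=0):
    """Fit f[s, c] ~ u_s . v_c on the observed entries only.

    residuals: dict (s, c) -> f_{s,c}, the grade minus the influence terms.
    Returns (U, V) as lists of length-k lists.
    """
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_students)]
    V = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_courses)]

    # Index the observed entries by student and by course.
    by_student, by_course = {}, {}
    for (s, c), f in residuals.items():
        by_student.setdefault(s, []).append((c, f))
        by_course.setdefault(c, []).append((s, f))

    def grad_step(P, Q, observed):
        # One gradient step on the rows of P while Q is held fixed.
        for p, pairs in observed.items():
            grad = [gamma * x for x in P[p]]      # gradient of the L2 penalty
            for q, f in pairs:
                err = f - sum(a * b for a, b in zip(P[p], Q[q]))
                for d in range(k):
                    grad[d] -= err * Q[q][d]      # gradient of the squared loss
            P[p] = [x - lr * g for x, g in zip(P[p], grad)]

    for _ in range(n_iter):
        grad_step(U, V, by_student)   # update U with V fixed
        grad_step(V, U, by_course)    # update V with U fixed
    return U, V
```

Only observed (student, course) pairs contribute to the loss, which is what lets the factorization cope with a sparse grade matrix.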


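Step 2: Update $A$. Fixing all the other variables and solving
for $A$, the problem becomes

$$
\begin{aligned}
\min_{A} \;
& \frac{1}{2} \sum_{s,c} (g_{s,c} - \tilde{g}_{s,c})^2
+ \frac{\rho}{2} \left( \|A - Z_1\|_F^2 + \|A - Z_2\|_F^2 \right) \\
& + \rho \, \mathrm{tr}\!\left( U_1^T (A - Z_1) \right)
+ \rho \, \mathrm{tr}\!\left( U_2^T (A - Z_2) \right) \\
\text{s.t. } & A \ge 0
\end{aligned}
$$

Using gradient descent, the elements in $A$ can be updated as follows.

$$
\begin{aligned}
A(c_i, c_j) = A(c_i, c_j) - lr \times \Big[
& \rho \left( A(c_i, c_j) - Z_1(c_i, c_j) \right)
+ \rho \left( A(c_i, c_j) - Z_2(c_i, c_j) \right) \\
& + \rho U_1(c_i, c_j) + \rho U_2(c_i, c_j) \\
& - \sum_{s} (g_{s,c_j} - \tilde{g}_{s,c_j}) \times
\begin{cases}
e^{-\alpha} \dfrac{g_{s,c_i}}{|G_{T-1}(s,:)|} & \text{(if } c_i \text{ is taken in term } T-1\text{)} \\[2ex]
e^{-2\alpha} \dfrac{g_{s,c_i}}{|G_{T-2}(s,:)|} & \text{(if } c_i \text{ is taken in term } T-2\text{)}
\end{cases}
\Big]
\end{aligned}
\tag{3}
$$

with projection into $[0, +\infty)$, where $lr$ is a learning rate.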
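A single projected gradient step on the influence matrix can be sketched as below. The sketch assumes a dictionary representation of $A$, $Z_1$, $Z_2$, $U_1$ and $U_2$ keyed by course pairs, and a list of per-prediction records; the function name `update_influence`, the record layout and the default hyperparameters are hypothetical and chosen only for illustration.

```python
import math

def update_influence(A, Z1, Z2, U1, U2, records, rho=1.0, lr=0.01, alpha=1.0):
    """One projected gradient step on A, given as a dict (ci, cj) -> value.

    records: list of (err, cj, prev1, prev2), where err = g - g_tilde for a
             student on course cj, and prev1 / prev2 map the courses that the
             student took in terms T-1 / T-2 to the grades obtained.
    """
    # Gradient of the ADMM penalty and dual terms.
    grad = {}
    for key, a in A.items():
        grad[key] = (rho * (a - Z1.get(key, 0.0)) + rho * (a - Z2.get(key, 0.0))
                     + rho * U1.get(key, 0.0) + rho * U2.get(key, 0.0))
    # Gradient of the squared prediction error.
    for err, cj, prev1, prev2 in records:
        for lag, grades in ((1, prev1), (2, prev2)):
            for ci, g in grades.items():
                if (ci, cj) in grad:
                    grad[(ci, cj)] -= err * math.exp(-lag * alpha) * g / len(grades)
    # Gradient step followed by projection onto the non-negative orthant.
    return {key: max(0.0, a - lr * grad[key]) for key, a in A.items()}
```

Only the pairs already present in `A` are updated, in line with the observation in the Appendix that entries for courses never taken together stay at zero.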


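Step 3: Update $Z_1$ and $Z_2$. For $Z_1$, the problem becomes

$$
\min_{Z_1} \; \tau \|Z_1\|_* + \frac{\rho}{2} \|A - Z_1\|_F^2
+ \rho \, \mathrm{tr}\!\left( U_1^T (A - Z_1) \right)
\tag{4}
$$

The closed-form solution of this problem is

$$
Z_1 = S_{\tau/\rho}(A + U_1)
\tag{5}
$$

where $S_{\alpha}(X)$ is a soft-thresholding function that shrinks the
singular values of $X$ with a threshold $\alpha$, that is,

$$
S_{\alpha}(X) = U \, \mathrm{diag}\!\left( (\Sigma - \alpha)_+ \right) V^T
\tag{6}
$$

where $X = U \Sigma V^T$ is the singular value decomposition of $X$,
and

$$
(x)_+ = \max(x, 0).
\tag{7}
$$

For $Z_2$, the problem becomes

$$
\min_{Z_2} \; \lambda \|Z_2\|_{\ell_1} + \frac{\rho}{2} \|A - Z_2\|_F^2
+ \rho \, \mathrm{tr}\!\left( U_2^T (A - Z_2) \right)
\tag{8}
$$

The closed-form solution is

$$
Z_2 = E_{\lambda/\rho}(A + U_2)
\tag{9}
$$

where $E_{\alpha}(X)$ is a soft-thresholding function that shrinks the
values in $X$ with a threshold $\alpha$, that is,

$$
E_{\alpha}(X) = (X - \alpha)_+
\tag{10}
$$

where $(\cdot)_+$ is defined as in Equation 7.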
Step 4: Update $U_1$ and $U_2$. $U_1$ and $U_2$ are updated based
on standard ADMM updates:

$$
U_1 = U_1 + (A - Z_1); \qquad U_2 = U_2 + (A - Z_2)
\tag{11}
$$

A computational complexity analysis of MFTCI is given in the
Appendix.
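The element-wise shrinkage of Equation 10 and the dual update of Equation 11 are simple enough to sketch in plain Python. The sketch below covers only the $Z_2$/$U_2$ pair; the $Z_1$ update additionally needs a singular value decomposition and is omitted here. The function names and the nested-list matrix representation are assumptions made for this example.

```python
def shrink_positive(X, threshold):
    """Element-wise shrinkage (x - threshold)_+ applied to a nested list."""
    return [[max(x - threshold, 0.0) for x in row] for row in X]

def update_z2_and_dual(A, U2, lam, rho):
    """Closed-form Z2 update followed by the standard ADMM dual update.

    A, U2 : matrices of equal shape given as nested lists.
    lam   : weight of the l1 penalty; rho: ADMM penalty parameter.
    """
    # Z2 = E_{lam/rho}(A + U2)
    shifted = [[a + u for a, u in zip(row_a, row_u)]
               for row_a, row_u in zip(A, U2)]
    Z2 = shrink_positive(shifted, lam / rho)
    # U2 <- U2 + (A - Z2)
    U2_new = [[u + a - z for u, a, z in zip(row_u, row_a, row_z)]
              for row_u, row_a, row_z in zip(U2, A, Z2)]
    return Z2, U2_new
```

Entries of `A + U2` that fall below the threshold `lam / rho` are set exactly to zero, which is how the sparsity of the learned influences arises in this step.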


5. EXPERIMENTS 


5.1 Dataset Description 

We evaluated our method on student grade records obtained 
from George Mason University (GMU) from Fall 2009 to 
Spring 2016. This period included data for 23,013 transfer 
students and 20,086 first-time freshmen (non-transfer i.e., 
students who begin their study at GMU) across 151 majors 
enrolled in 4,654 courses. 


Specifically, we extracted data for six large and diverse
majors for both non-transfer and transfer students. These
majors include: (i) Applied Information Technology (AIT), (ii)
Biology (BIOL), (iii) Civil, Environmental and Infrastructure
Engineering (CEIE), (iv) Computer Engineering (CPE),
(v) Computer Science (CS) and (vi) Psychology (PSYC).
Table 1 provides more information about these datasets.

Table 1: Dataset Descriptions

         Non-Transfer Students        Transfer Students
Major    #S      #C      #(S,C)       #S      #C      #(S,C)
AIT        239     453    5,739         982     465   14,396
BIOL     1,448     990   33,527       1,330     833   22,691
CEIE       393     642    9,812         227     305    4,538
CPE        340     649    7,710          91     219    1,614
CS         908     818   18,376         480     464    7,967
PSYC       911     874   22,598       1,504     788   24,661
Total    4,239   1,115   97,762       4,614   1,019   75,867

#S, #C and #(S,C) are the number of students, courses and student-course
pairs in educational records across the 6 majors from Fall 2009 to
Spring 2016, respectively.

[Figure 1: Different Experimental Protocols. The three training sets
span Fall 2009 to Fall 2015, Fall 2009 to Spring 2015, and Fall 2009
to Fall 2014.]


5.2 Experimental Protocol 

To assess the performance of our next-term grade prediction
models, we train our models on data up to term $T - 1$
and make predictions for term $T$. We evaluate our method
for three test terms, i.e., Spring 2016, Fall 2015 and Spring
2015. As an example, for evaluating predictions for term
Fall 2015, data from Fall 2009 to Spring 2015 is considered
as training data and data from Fall 2015 is testing data.
Figure 1 shows the three different train-test splits.
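The protocol above amounts to a chronological split. A minimal sketch, assuming records are stored as (student, course, grade, term) tuples and that the list of terms is given in chronological order (both are assumptions of this example, as is the function name `split_by_term`):

```python
# Chronologically ordered terms (only the last four are listed for brevity).
TERMS = ["Fall 2014", "Spring 2015", "Fall 2015", "Spring 2016"]

def split_by_term(records, test_term, terms=TERMS):
    """Train on every term strictly before test_term, test on test_term."""
    cutoff = terms.index(test_term)
    train = [r for r in records if terms.index(r[3]) < cutoff]
    test = [r for r in records if r[3] == test_term]
    return train, test

# Hypothetical records: (student, course, grade, term).
records = [
    ("s1", "c1", 3.67, "Fall 2014"),
    ("s1", "c2", 3.00, "Spring 2015"),
    ("s1", "c3", 4.00, "Fall 2015"),
    ("s2", "c1", 2.33, "Spring 2016"),
]
train, test = split_by_term(records, "Fall 2015")
print(len(train), len(test))
```

Records from terms after the test term are dropped from both sets, so no future information leaks into training.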


5.3. Evaluation Metrics 

We use Root Mean Squared Error (RMSE) and Mean Absolute Error
(MAE) as evaluation metrics, defined as follows:

$$
\mathrm{RMSE} = \sqrt{\frac{\sum_{s,c \in G_T} (g_{s,c} - \tilde{g}_{s,c})^2}{|G_T|}},
\qquad
\mathrm{MAE} = \frac{\sum_{s,c \in G_T} |g_{s,c} - \tilde{g}_{s,c}|}{|G_T|}
$$

where $g_{s,c}$ and $\tilde{g}_{s,c}$ are the ground truth and predicted
grade for student $s$ on course $c$, and $G_T$ is the testing set of
(student, course, grade) triples in the $T$-th term. In the next-term
grade prediction problem, MAE is normally more intuitive than RMSE,
since MAE measures the deviation of errors directly while RMSE
penalizes large errors more heavily.
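Both metrics are straightforward to compute from (true grade, predicted grade) pairs. The following short Python sketch uses hypothetical function names and made-up example values:

```python
import math

def rmse(pairs):
    """Root mean squared error over (true, predicted) grade pairs."""
    return math.sqrt(sum((g - p) ** 2 for g, p in pairs) / len(pairs))

def mae(pairs):
    """Mean absolute error over (true, predicted) grade pairs."""
    return sum(abs(g - p) for g, p in pairs) / len(pairs)

# Made-up example: errors of 1, 0 and -2 grade points.
pairs = [(4.0, 3.0), (2.0, 2.0), (3.0, 5.0)]
print(mae(pairs))   # mean of |1|, |0|, |-2|
print(rmse(pairs))  # square root of (1 + 0 + 4) / 3
```

In this example the single 2-point error dominates RMSE more than MAE, which illustrates the heavier penalty RMSE places on large errors.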


For our dataset, a student's grade can be a letter grade (i.e.,
A, A-, ..., F). As done previously by Polyzou et al. [24], we
define a tick to denote the difference between two consecutive
letter grades (e.g., C+ vs C or C vs C-). To assess the
performance of our grade prediction method, we convert the
predicted grades into their closest letter grades and compute
the percentage of predicted grades with no error (or
0-ticks), within 1-tick and within 2-ticks, denoted by Pct$_0$,
Pct$_1$ and Pct$_2$, respectively. For the problem of course
selection and degree planning, courses predicted within 2 ticks
can be considered sufficiently correct. We name these metrics
Percentage of Tick Accuracy (PTA).
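The tick-based metric can be sketched as follows. The letter-grade scale and its grade-point values below are an assumption made for illustration (a common 4.0 scale); the scale actually used in the evaluation is not specified in the text, and the function names are hypothetical.

```python
# Assumed letter-grade scale, ordered from lowest to highest.
SCALE = [("F", 0.0), ("D", 1.0), ("C-", 1.67), ("C", 2.0), ("C+", 2.33),
         ("B-", 2.67), ("B", 3.0), ("B+", 3.33), ("A-", 3.67), ("A", 4.0)]

def closest_letter_index(points):
    """Index in SCALE of the letter grade closest to a numeric grade."""
    return min(range(len(SCALE)), key=lambda i: abs(SCALE[i][1] - points))

def tick_accuracy(pairs, max_ticks):
    """Percentage of (true, predicted) pairs that differ by at most max_ticks."""
    hits = sum(
        1 for true, pred in pairs
        if abs(closest_letter_index(true) - closest_letter_index(pred)) <= max_ticks
    )
    return 100.0 * hits / len(pairs)

# Made-up example: A vs 3.9 (0 ticks), B vs 3.4 (1 tick), C vs 3.0 (3 ticks).
pairs = [(4.0, 3.9), (3.0, 3.4), (2.0, 3.0)]
print(tick_accuracy(pairs, 0), tick_accuracy(pairs, 1), tick_accuracy(pairs, 2))
```

Because a tick is defined on positions in the ordered scale rather than on grade points, the metric treats every step between adjacent letter grades as equally costly.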


5.4 Baseline Methods 


We compare the performance of our proposed method to the 
following baseline approaches. 


5.4.1 Matrix Factorization (MF)

Matrix factorization is known to be successful in predicting
ratings accurately in recommender systems [26]. This
approach can be applied directly to the next-term grade
prediction problem by treating the student-course grade matrix as
a user-item rating matrix in recommender systems. Based
on the assumption that each course and student can be
represented in the same low-dimensional space, corresponding
to the knowledge space, two low-rank matrices containing
latent factors are learned to represent courses and students
[30]. Specifically, the grade a student $s$ will achieve on a
course $c$ is predicted as follows:

$$
\tilde{g}_{s,c} = \mu + p_s + q_c + \mathbf{u}_s^T \mathbf{v}_c
\tag{12}
$$

where $\mu$ is a global bias term, $p_s$ ($\mathbf{p} \in \mathbb{R}^n$)
and $q_c$ ($\mathbf{q} \in \mathbb{R}^m$) are the student and course bias
terms (in this case, for student $s$ and course $c$), respectively, and
$\mathbf{u}_s$ ($U \in \mathbb{R}^{k \times n}$) and $\mathbf{v}_c$
($V \in \mathbb{R}^{k \times m}$) are the latent factors for student $s$
and course $c$, respectively.


5.4.2 Matrix Factorization without Bias (MF$_0$)

We only consider the student and course latent factors to
predict the next-term grades. Therefore, the grade a student
$s$ will achieve on a course $c$ is calculated as follows:

$$
\tilde{g}_{s,c} = \mathbf{u}_s^T \mathbf{v}_c
\tag{13}
$$


5.4.3 Non-negative Matrix Factorization (NMF) [15]
We add non-negativity constraints on matrix $U$ and matrix $V$
in Equation 13. The non-negativity constraints allow MF
approaches to have better interpretability and accuracy for
non-negative data [10].


6. RESULTS AND DISCUSSION 


6.1 Overall Performance 

Table 2 presents the comparison of Pct$_0$, Pct$_1$ and Pct$_2$ for
non-transfer students for the three test terms:
Spring 2016, Fall 2015 and Spring 2015. We observe that the
MFTCI model outperforms the baselines across the different
test sets. On average, MFTCI outperforms the MF, MF$_0$
and NMF methods by 34.18%, 11.59% and 4.08% in terms of
Pct$_0$, 16.64%, 7.96% and 4.03% in terms of Pct$_1$, and 2.10%,
3.00% and 1.98% in terms of Pct$_2$, respectively. We observe
similar results for transfer students as well (not included
here for brevity).
Table 2: Comparison Performance with PTA (%)

          Spring 2016                      Fall 2015                        Spring 2015
Methods   Pct0(↑)   Pct1(↑)   Pct2(↑)      Pct0(↑)   Pct1(↑)   Pct2(↑)      Pct0(↑)   Pct1(↑)   Pct2(↑)
MF        13.25     27.71     58.02        12.05     26.63     58.89        13.03     26.09     54.83
MF0       16.52     31.65     57.46        15.51     30.03     55.64        15.53     29.53     54.94
NMF       13.21     27.04     57.18        15.33     30.12     56.15        15.56     29.23     54.93
MFTCI     19.78*    35.52*    61.44*       19.71*    35.16*    60.12*       18.56*    32.78*    58.80*

i) "↑" indicates the higher the better. ii) Reported values of Pct0, Pct1 and Pct2 are
percentages. iii) Best performing values are marked with *.


Table 3 presents the performance of the baselines and the MFTCI
model for the three different terms for both non-transfer and
transfer students using RMSE and MAE as evaluation metrics.
The MFTCI model consistently outperforms the baselines
across the different datasets in terms of MAE. In addition,
the results show that MF$_0$, NMF and MFTCI tend
to perform better for the Spring 2016 term than for the Fall
2015 term. A similar trend is observed between the Fall 2015 term
and the Spring 2015 term. This suggests that MFTCI is likely
to perform better with more information in the
training set.


6.2 Analysis on Individual Majors 

We divide non-transfer students based on their majors and
test the baselines and the MFTCI model on each major
separately. Table 4 shows the comparison of Pct$_0$, Pct$_1$ and
Pct$_2$ on different majors. The results show that MFTCI has
the best performance for almost all the majors. Among all
the results, MFTCI has the highest accuracy when predicting
grades for PSYC and BIOL students, for which we have
more student-course pairs in the training set.


6.3 Effects from Previous Terms on MFTCI 
To examine the influence of the number of previous terms
considered in MFTCI, we run our model with only $\Delta(T - 1)$
in Equation 1. This method is denoted MFTCI$_{p1}$.
Figure 2 shows the comparison of MAE for the six subsets
of data reported in Table 3, where "NTR"
stands for non-transfer students and "TR" stands for transfer
students. The results show that MFTCI consistently
outperforms MFTCI$_{p1}$ on all datasets. This suggests that
considering two previous terms is necessary for achieving
good prediction results. Moreover, since the influence of
previous terms is modeled using an exponentially
decaying function over time, we do not include the influence
from the third previous term in our model, as its influence
on the grade prediction is negligible in comparison to the
previous two terms.


6.4 Visualization of Course Influence 

To interpret what is captured in the course influence matrix
$A$ (see Eq. 1), we extract the top 20 values with the
corresponding course names (and topics) for analysis. Figures 3
and 4 show the captured pairwise course influences for the CS
and AIT majors, respectively. Each node corresponds to
one course, which is represented by a shortened course
name. The figures show that most influences
reflect content dependency between courses. For example,
in the CS major, the "Object Oriented Programming" course
has significant influence on performance in the "Low-Level
Programming" course (the former is also a prerequisite of
the latter); "Linear Algebra" and "Discrete Mathematics"
have influence on each other; and the "Formal Methods &
Models" course has influence on the "Analysis of Algorithms"
course. In the case of the AIT major, both the "Introductory IT"
course and the "Introductory Computing" course have influence
on the "IT Problem & Programming" course; the "Multimedia &
Web Design" course has influence on both the "Applied IT
Programming" course and the "IT in the Global Economy" course.
GMU has a sample schedule of eight-term courses for each
major in order to guide undergraduate students to finish
their study step by step based on the level, content and
difficulty of courses². Among the identified relationships
shown in Figures 3 and 4, we found 17 and 13 of the CS and
AIT course influences in the guide map, respectively. The
rest of the identified influences are among general electives
that are nonetheless required (e.g., the "Public Speaking" course),
or specific electives pertaining to the major (e.g., the "Research
Methods" course). This shows that our model learns meaningful
course-wise influences and successfully uses them to improve
the MF model.

[Figure 2: Comparison performance for MFTCI$_{p1}$ and MFTCI on six
datasets: NTR Spring 2016, NTR Fall 2015, NTR Spring 2015,
TR Spring 2016, TR Fall 2015 and TR Spring 2015.]


Figure 5 shows the identified course influences for the BIOL, 
CEIE, CPE and PSYC majors. These identified course-wise 
influences seem to capture similarity of course content. 


7. CONCLUSION AND FUTURE WORK 


We presented a Matrix Factorization with Temporal Course- 
wise Influence (MFTCI) model that integrates factorization 
models and the influence of courses taken in the preceding 
terms to predict student grades for the next term. 


We evaluate our model on the student educational records
from Fall 2009 to Spring 2016 collected from George Mason
University. The dataset in this study contains both
non-transfer and transfer students from six different majors.
Our experimental evaluation shows that MFTCI consistently
outperforms the different state-of-the-art methods.
Moreover, we analyze the effects from previous terms on
MFTCI, and we conclude that it is necessary
to consider two previous terms. In addition, we visualize
the patterns learned between pairs of courses. The results
strongly demonstrate that the learned course influences
correlate with the course content within academic programs.

In the future, we will explore incorporation of additional
constraints over the pairwise course influence matrix, such
as prerequisite information and the compulsory or elective
status of a course. We will also explore using the course influence
information to build a degree planner for future students.

² http://catalog.gmu.edu

Table 3: Comparison Performance with RMSE and MAE.

          Non-Transfer Students                             Transfer Students
          Spring 2016    Fall 2015      Spring 2015         Spring 2016    Fall 2015      Spring 2015
Methods   RMSE   MAE     RMSE   MAE     RMSE   MAE          RMSE   MAE     RMSE   MAE     RMSE   MAE
MF        0.999  0.754   1.037  0.786   1.023  0.784        0.925  0.688   0.921  0.686   0.985  0.732
MF0       0.929  0.714   0.977  0.752   1.014  0.778        0.893  0.668   0.944  0.705   1.011  0.765
NMF       1.020  0.769   0.967  0.746   1.000  0.771        0.906  0.683   0.932  0.701   0.979  0.746
MFTCI     0.928  0.685   0.982  0.717   1.012  0.750        0.887  0.636   0.927  0.662   1.000  0.721

[Figure 3: Identified course influences for CS major. Recoverable node
labels include "Research Methods", "Analysis of Algorithms" and
"Reading & Writing".]

Table 4: Comparison Performance for Different Majors

Metric   Methods   AIT     BIOL    CEIE    CPE     CS      PSYC
Pct0     MF        18.71   18.00   15.99   12.99   15.98   20.18
         MF0       19.45   22.10   16.70   14.21   16.47   22.12
         NMF       19.77   22.16   17.01   14.32   16.61   22.17
         MFTCI     22.30   24.24   16.80   14.32   17.32   25.83
Pct1     MF        37.95   35.43   31.47   27.86   31.53   39.41
         MF0       37.21   39.68   31.87   27.97   30.51   39.63
         NMF       36.79   39.74   31.67   27.19   30.43   39.36
         MFTCI     39.64   40.87   32.38   27.53   31.78   42.29
Pct2     MF        67.02   67.78   58.66   52.28   56.91   71.01
         MF0       66.17   67.54   58.35   50.72   56.24   67.74
         NMF       66.70   67.54   58.55   51.17   56.17   67.79
         MFTCI     66.70   68.25   58.76   52.94   58.18   68.29
APPENDIX

A. COMPUTATIONAL COMPLEXITY ANALYSIS

The computational complexity of MFTCI is determined by
the four steps in the alternating approach as described above.
To update $U$ and $V$ as in Equation 2 using gradient descent
via alternating minimization, the computational complexity is
$O(niter_{uv}(k \times n_{s,c} + k \times m + k \times n)) = O(niter_{uv}(k \times n_{s,c}))$
(typically $n_{s,c} > \max(m, n)$), where $n_{s,c}$
is the total number of student-course dyads, $n$ is the number
of students, $m$ is the number of courses, $k$ is the latent
dimension of $U$ and $V$, and $niter_{uv}$ is the number of
iterations. To update $A$ as in Equation 3 using gradient descent,
the computational complexity is upper-bounded by
$O(niter_{a}(n_{c,c} \times \bar{n}_{s,c}))$, where $n_{c,c}$ is the number
of course pairs that have been taken by at least one student,
$\bar{n}_{s,c}$ is the average number of students for a course, which
upper bounds the average number of students who co-take two
courses, and $niter_{a}$ is the number of iterations. Essentially,
to update $A$, we only need to update $A(c_i, c_j)$ where $c_i$ and
$c_j$ have been co-taken by some students. The entries $A(c_i, c_j)$
for which $c_i$ and $c_j$ have never been taken together remain
0. To update $Z_1$ as in Equation 4, a singular value decomposition
is involved and thus its computational complexity
is upper bounded by $O(m^3)$. To update $Z_2$ as in Equation 8,
the computational complexity is $O(m^2)$. To update
Figure 5: Identified course influences for different majors